Valkeydependency #876

hpopuri2 · 2026-01-27T17:11:03Z

Add Valkey Distributed Cache for Horizontal Scaling

Summary

This PR implements distributed caching using Valkey to enable horizontal scaling of Trino Gateway. Multiple gateway instances can now share query metadata through a distributed cache layer, ensuring consistent query routing across all
instances.

Motivation

Currently, Trino Gateway uses local Guava caches that are not shared between instances. In multi-instance deployments, this can lead to:

Inconsistent query routing when requests hit different gateway instances
Cache misses requiring expensive database lookups
Inability to leverage cache across horizontally scaled deployments

This implementation addresses these limitations while maintaining backward compatibility and graceful degradation.

Architecture

3-Tier Caching Strategy

Request Flow:

L1 Cache (Local Guava) → ~1ms
├─ Hit: Return immediately
└─ Miss: Check L2
L2 Cache (Valkey Distributed) → ~5ms
├─ Hit: Populate L1, return
└─ Miss: Check L3
L3 Cache (PostgreSQL Database) → 50ms
├─ Found: Populate L2 + L1, return
└─ Not Found: Search backends via HTTP (200ms)

Cache Keys

trino:query:backend:{queryId} - Backend URL for query
trino:query:routinggroup:{queryId} - Routing group for query
trino:query:externalurl:{queryId} - External URL (lazy-loaded)

Implementation Details

Core Components

ValkeyConfiguration (gateway-ha/src/main/java/io/trino/gateway/ha/config/ValkeyConfiguration.java)

11 configurable parameters with sensible defaults
Input validation (port range, positive values, health check intervals)
Convention over Configuration - only 3 params required (enabled, host, port)

ValkeyDistributedCache (gateway-ha/src/main/java/io/trino/gateway/ha/router/ValkeyDistributedCache.java)

Implements DistributedCache interface
JedisPool connection pooling with configurable pool size
Modern Duration API (no deprecated methods)
Graceful degradation when disabled or unhealthy
Periodic health checks (PING) with configurable interval
Metrics tracking: hits, misses, writes, errors, hit rate

DistributedCache Interface (gateway-ha/src/main/java/io/trino/gateway/ha/router/DistributedCache.java)

Clean abstraction for cache operations (get, set, delete, isHealthy)
Enables future alternative implementations

Integration

BaseRoutingManager - Updated routing logic:

Write-through caching for backend and routing group (cache on write)
Lazy-loading for external URLs (cache on first read)
Cache key documentation with Javadoc
Graceful fallback to database when cache unavailable

HaGatewayProviderModule - Dependency injection:

Provides DistributedCache singleton
Wires ValkeyConfiguration to ValkeyDistributedCache

Configuration

Minimal (Recommended for Getting Started)

  ```yaml
  valkeyConfiguration:
    enabled: true
    host: localhost
    port: 6379
    # password: ${VALKEY_PASSWORD}  # Optional: if AUTH required
    # cacheTtlSeconds: 1800  # Optional: Cache TTL (default: 1800 = 30 minutes)

Advanced (Production Tuning)

valkeyConfiguration:
  enabled: true
  host: valkey.internal.prod
  port: 6379
  password: ${VALKEY_PASSWORD}
  database: 0
  maxTotal: 100              # More connections for high concurrency
  maxIdle: 50
  minIdle: 25
  timeoutMs: 5000            # Longer timeout for slower networks
  cacheTtlSeconds: 3600      # 1 hour for long-running queries
  healthCheckIntervalMs: 60000  # 1 minute health checks

Single Instance (No Changes Required)

valkeyConfiguration:
enabled: false # Default - local cache sufficient

Testing

Unit Tests (31 total, all passing)

TestValkeyConfiguration (16 tests)

Default values verification
Setter/getter correctness
Input validation (port range, positive values, etc.)
Edge cases (null password, various database indices)

TestValkeyDistributedCache (15 tests)

Disabled cache behavior (returns empty, fails gracefully)
Invalid host handling (marks unhealthy)
Close operations (safe cleanup)
All cache operations (get, set, delete with variations)
Null/empty array handling
Password and database configuration
Pool configuration acceptance

Integration Tests (existing tests updated)

All routing manager tests updated with distributed cache
NoopDistributedCache for tests not requiring real cache
No regression in existing functionality

Documentation

Comprehensive documentation added:

New File: docs/valkey-configuration.md (273 lines)

Quick start with minimal config
Full configuration reference
Deployment scenarios (single vs. multi-instance)
Performance tuning guidelines
Connection pool sizing recommendations
Monitoring and troubleshooting
Security checklist
Architecture explanation
Migration guide
FAQ

Updated Files:

docs/installation.md - Added "Configure distributed cache" section
docs/operation.md - Added "Multi-instance deployments" monitoring section
mkdocs.yml - Added navigation entry for Valkey docs

Backward Compatibility

✅ Fully backward compatible

Disabled by default (enabled: false)
No changes required to existing configs
Single-instance deployments work exactly as before
Existing tests pass without modification (after updating constructors)

Migration Path

From Single to Multi-Gateway

Deploy Valkey server
Update config.yaml on all gateways:
valkeyConfiguration:
enabled: true
host: valkey.internal
port: 6379
password: ${VALKEY_PASSWORD}
Rolling restart gateways
Monitor cache hit rates

No data migration needed - cache populates automatically.

Graceful Degradation

When Valkey is unavailable:

✅ Queries continue working
✅ Falls back to database lookups
✅ Logs warnings (not errors)
✅ Marks cache as unhealthy
✅ Auto-recovery when Valkey returns

Monitoring

Cache metrics available via ValkeyDistributedCache:

getCacheHits() - Total cache hits
getCacheMisses() - Total cache misses
getCacheWrites() - Total successful writes
getCacheErrors() - Total operation errors
getCacheHitRate() - Hit rate percentage (0-100)

Future work: Expose these via /metrics endpoint for Prometheus.

Dependencies

Added: io.valkey:valkey-java:5.5.0

Valkey is a Redis fork with compatible protocol
Works with both Valkey and Redis servers
Apache 2.0 licensed

Files Changed

New Files (7)

gateway-ha/src/main/java/io/trino/gateway/ha/config/ValkeyConfiguration.java
gateway-ha/src/main/java/io/trino/gateway/ha/router/DistributedCache.java
gateway-ha/src/main/java/io/trino/gateway/ha/router/ValkeyDistributedCache.java
gateway-ha/src/test/java/io/trino/gateway/ha/config/TestValkeyConfiguration.java
gateway-ha/src/test/java/io/trino/gateway/ha/router/NoopDistributedCache.java
gateway-ha/src/test/java/io/trino/gateway/ha/router/TestValkeyDistributedCache.java
docs/valkey-configuration.md

Modified Files (16)

Configuration: gateway-ha/config.yaml, docs/config.yaml
Core: HaGatewayConfiguration.java, HaGatewayProviderModule.java, BaseRoutingManager.java
Routing: QueryCountBasedRouter.java, StochasticRoutingManager.java, ProxyRequestHandler.java
Tests: 4 routing manager tests updated
Docs: installation.md, operation.md, mkdocs.yml

Checklist

Code follows project style guidelines
No deprecated APIs used (Duration instead of milliseconds)
Comprehensive input validation
Unit tests written and passing (31 tests)
Integration tests updated and passing
Documentation complete and accurate
Configuration examples provided
Backward compatible (disabled by default)
Graceful degradation implemented
Security considerations documented
Performance tuning guidelines included

Future Enhancements

Expose cache metrics via /metrics OpenMetrics endpoint
Add TLS/SSL support for Valkey connections
Support Redis Cluster mode
Add cache warming on startup
Implement cache eviction strategies
Add circuit breaker pattern for cache failures

Testing Instructions

Local Testing (Single Instance)

No changes needed - works as before

java -jar gateway-ha.jar config.yaml

Multi-Instance with Valkey

Start Valkey

docker run -d -p 6379:6379 valkey/valkey:latest

Update config.yaml

valkeyConfiguration:
enabled: true
host: localhost
port: 6379

Start multiple gateways

java -jar gateway-ha.jar config.yaml

Verify Cache Working

This commit adds Valkey (Redis-compatible) distributed cache support to enable horizontal scaling of Trino Gateway across multiple instances. Key features: - 3-tier caching: Guava (local) -> Valkey (distributed) -> PostgreSQL (persistent) - Graceful degradation if Valkey is unavailable - Connection pooling with health checks - Configurable TTL and connection parameters - Comprehensive documentation New files: - ValkeyConfiguration.java: Configuration class - DistributedCache.java: Cache interface - ValkeyDistributedCache.java: Valkey implementation - NoopDistributedCache.java: Test implementation - docs/valkey-configuration.md: Complete documentation Modified files: - Integrated cache into routing managers - Updated dependency injection - Added Jedis dependency - Updated configuration files - Updated tests

cla-bot bot added the cla-signed label Jan 27, 2026

hpopuri2 force-pushed the valkeydependency branch from 7dfb225 to fcd625b Compare January 27, 2026 17:12

hpopuri2 force-pushed the valkeydependency branch from fcd625b to 17ae63d Compare January 27, 2026 17:26

hpopuri2 closed this Jan 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Valkeydependency #876

Valkeydependency #876

Uh oh!

hpopuri2 commented Jan 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Valkeydependency #876

Valkeydependency #876

Uh oh!

Conversation

hpopuri2 commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Add Valkey Distributed Cache for Horizontal Scaling

Summary

Motivation

Architecture

3-Tier Caching Strategy

Cache Keys

Implementation Details

Core Components

Integration

Configuration

Minimal (Recommended for Getting Started)

No changes needed - works as before

Start Valkey

Update config.yaml

Start multiple gateways

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

hpopuri2 commented Jan 27, 2026 •

edited

Loading